low-rank constraint
- Europe > Switzerland (0.04)
- Asia > Thailand (0.04)
- Oceania > Australia (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Leisure & Entertainment (0.48)
- Media > Music (0.30)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.98)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Low-Rank Constraints for Fast Inference in Structured Models
Structured distributions, i.e., distributions over combinatorial spaces, are commonly used to learn latent probabilistic representations from observed data. However, scaling these models is bottlenecked by high computational and memory complexity with respect to the size of the latent representation. Common models such as Hidden Markov Models (HMMs) and Probabilistic Context-Free Grammars (PCFGs) require time and space quadratic and cubic in the number of hidden states, respectively. This work demonstrates a simple approach to reducing the computational and memory complexity of a large class of structured models. We show that by viewing the central inference step as a matrix-vector product and imposing a low-rank constraint, we can trade off model expressivity and speed via the rank. Experiments with neurally parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces while providing practical speedups.
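To make the matrix-vector view concrete, here is a minimal sketch (not the paper's implementation; shapes and names are illustrative) of one HMM forward update, dense versus low-rank. With the transition matrix factored as A = UVᵀ of rank r, the O(n²) product Aᵀα becomes two O(nr) products:

```python
import numpy as np

def forward_step_dense(alpha, A, b):
    # Standard HMM forward update alpha' = (A^T alpha) * b: O(n^2) for n states.
    return (A.T @ alpha) * b

def forward_step_low_rank(alpha, U, V, b):
    # Same update with A = U @ V.T of rank r: two O(n*r) mat-vecs instead of one O(n^2).
    return (V @ (U.T @ alpha)) * b

# Sanity check: the two updates agree whenever A = U @ V.T.
n, r = 512, 16
rng = np.random.default_rng(0)
U, V = rng.random((n, r)), rng.random((n, r))  # nonnegative factors keep A nonnegative
alpha, b = rng.random(n), rng.random(n)
assert np.allclose(forward_step_dense(alpha, U @ V.T, b),
                   forward_step_low_rank(alpha, U, V, b))
```

The same substitution applies to the other dynamic programs the abstract mentions, where the central inference step is likewise a (batched) matrix-vector product.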
A. Expressivity of Low-Rank Models
We focus on the simplest case, HMMs, for an analysis of expressivity. We show that no 2-state HMM can realize a particular target marginal distribution; the argument starts by setting up a system of equations. We provide the low-rank hypergraph marginalization algorithms for HMMs and PCFGs in Alg. 4, with loops over labels. Words outside of the vocabulary are mapped to the UNK token, and the dataset lengths are given in Table 3.
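For concreteness, the general shape of such a system is standard (a sketch in common HMM notation; the specific target marginal from the paper is not reproduced here). A 2-state HMM with start distribution π, transition matrix A, and emission matrix B assigns to an observation pair (x_1, x_2):

```latex
p(x_1, x_2) \;=\; \sum_{z_1=1}^{2} \sum_{z_2=1}^{2} \pi_{z_1}\, B_{z_1 x_1}\, A_{z_1 z_2}\, B_{z_2 x_2}.
```

Equating p(x_1, x_2) to the target probability for every observation pair yields a polynomial system in the entries of π, A, and B; showing that this system has no solution under the nonnegativity and normalization constraints proves that the target distribution lies outside the 2-state HMM family.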
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
Chen, Xi, Feng, Kaituo, Li, Changsheng, Lai, Xunhao, Yue, Xiangyu, Yuan, Ye, Wang, Guoren
Low-rank training has emerged as a promising approach for reducing memory usage when training Large Language Models (LLMs). Previous methods either decompose weight matrices (e.g., LoRA) or decompose gradient matrices (e.g., GaLore) to reduce memory consumption. However, both constrain training to a low-rank subspace, inevitably leading to sub-optimal performance. This raises a question: is it possible to consistently preserve the low-rank constraint for memory efficiency while achieving full-rank training (i.e., training with full-rank gradients of full-rank weights) to avoid inferior outcomes? In this paper, we propose Fira, a new plug-and-play training framework for LLMs, as a first attempt to achieve this goal. First, we observe an interesting phenomenon during LLM training: the scaling impact of adaptive optimizers (e.g., Adam) on the gradient norm remains similar from low-rank to full-rank training. Based on this observation, we propose a norm-based scaling method that uses the scaling impact of low-rank optimizers as a substitute for that of the original full-rank optimizers, enabling full-rank training. In this way, we preserve the low-rank constraint in the optimizer while achieving full-rank training for better performance. Moreover, we find that sudden gradient rises during optimization can cause loss spikes. To address this, we further put forward a norm-growth limiter that smooths the gradient by regulating the relative increase of gradient norms. Extensive experiments on the pre-training and fine-tuning of LLMs show that Fira outperforms both LoRA and GaLore, achieving performance comparable to or even better than full-rank training.
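As a rough sketch of the two mechanisms the abstract describes (hedged: this is not the authors' implementation; `adam_step`, the projector `P`, and the constant `gamma` are illustrative assumptions), one plausible reading is: run the optimizer in the low-rank subspace, reuse its norm-scaling ratio on the otherwise-discarded full-rank residual, and clip sudden relative growth of the gradient norm:

```python
import numpy as np

def norm_scaled_full_rank_update(G, P, adam_step, eps=1e-8):
    # Norm-based scaling (sketch). G: full-rank gradient (n x m); P: orthonormal
    # low-rank projector (n x r); adam_step: the optimizer's update rule.
    R = P.T @ G                   # low-rank gradient, as in GaLore-style methods
    residual = G - P @ R          # full-rank remainder, normally thrown away
    stepped = adam_step(R)        # optimizer state lives only in the r-dim subspace
    scale = np.linalg.norm(stepped) / (np.linalg.norm(R) + eps)
    # Reuse the optimizer's norm-scaling ratio on the residual so the update
    # covers the full-rank gradient while optimizer memory stays low-rank.
    return P @ stepped + scale * residual

def norm_growth_limiter(grad, prev_norm, gamma=1.01):
    # Limit the *relative* growth of the gradient norm to damp sudden spikes:
    # if ||g_t|| > gamma * ||g_{t-1}||, rescale g_t so the ratio equals gamma.
    norm = np.linalg.norm(grad)
    if prev_norm is not None and norm > gamma * prev_norm:
        grad = grad * (gamma * prev_norm / norm)
        norm = gamma * prev_norm
    return grad, norm  # carry `norm` forward as prev_norm for the next step
```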
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Hebei Province (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.70)
Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling
Shi, Jiawei, Deng, Hui, Dai, Yuchao
Even though Non-rigid Structure-from-Motion (NRSfM) has been extensively studied and great progress has been made, key challenges still hinder its broad real-world application: 1) the inherent motion/rotation ambiguity requires either explicit camera motion recovery with extra constraints or complex Procrustean Alignment; 2) existing low-rank modeling of the global shape can over-penalize drastic deformations in the 3D shape sequence. This paper resolves these issues from a spatial-temporal modeling perspective. First, we propose a novel Temporally-smooth Procrustean Alignment module that estimates 3D deforming shapes and adjusts the camera motion by aligning the 3D shape sequence consecutively. Our new alignment module removes the need for a complex reference 3D shape during alignment, which is more conducive to non-isotropic deformation modeling. Second, we propose a spatially-weighted approach that enforces the low-rank constraint adaptively at different locations to better accommodate drastic spatially-variant deformations. Our model outperforms existing low-rank based methods, and extensive experiments across different datasets validate the effectiveness of our method.
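For intuition about the alignment step, here is a minimal sketch of consecutively aligning a 3D shape sequence with SVD-based orthogonal Procrustes (an assumption: this is a generic building block, not the authors' Temporally-smooth Procrustean Alignment module; shapes are centered 3 x P point sets):

```python
import numpy as np

def procrustes_rotation(S, S_ref):
    # Orthogonal Procrustes: rotation R minimizing ||R @ S - S_ref||_F
    # for centered 3 x P point sets S and S_ref (Kabsch algorithm).
    U, _, Vt = np.linalg.svd(S_ref @ S.T)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # exclude reflections
    return U @ D @ Vt

def align_sequence(shapes):
    # Align each shape to its (already aligned) predecessor, removing the
    # frame-to-frame rotation ambiguity while keeping the sequence temporally smooth.
    aligned = [shapes[0]]
    for S in shapes[1:]:
        aligned.append(procrustes_rotation(S, aligned[-1]) @ S)
    return aligned
```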
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China (0.04)